\documentclass[a4paper,11pt]{article}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage{lmodern}
\usepackage{amsmath}
\usepackage{subfig}
\usepackage{graphicx}
\DeclareMathOperator*{\argmax}{arg\,max}
\title{Intelligent Agents Assignment 1} 
\author{Behrooz Mahasseni}

\begin{document}
\maketitle
%\tableofcontents
\begin{abstract}
  The goal of this assignment is to implement a general planning algorithm that finds the optimal non-stationary policy for a general MDP. We test our implementation on a simple MDP with several configurations to demonstrate its correctness.
\end{abstract}
\section{Implementation}
We implemented our algorithm in Java. The implementation consists of the following classes:
\begin{enumerate}
  \item State: This class stores the state id and the reward associated with the state. It also stores the values $V^k_{\pi^*}$, i.e., each state object stores the value of that state under the optimal policy when $k$ steps are left.
  \item Action: This class stores the id, name, and transition probabilities associated with the action.
  \item Policy: A policy interface that serves as the parent type for the stationary and non-stationary policy classes.
  \item NonStationaryPolicy: The class that implements the ``Policy'' interface. It has a function that returns the optimal action for a given state and number of remaining steps.
  \item MDP: This class stores the input MDP, i.e., its sets of states and actions. The Bellman backup algorithm (value iteration) is implemented in this class in the valueIteration() function, which computes the optimal policy and assigns it to the MDP; a sketch of this backup is given after the list.
\end{enumerate}
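For completeness, the Bellman backup computed in valueIteration() is the standard undiscounted finite-horizon recurrence
\begin{align*}
  V^0(s) &= R(s),\\
  V^k(s) &= R(s) + \max_{a} \sum_{s'} T(s,a,s')\, V^{k-1}(s'), \qquad k \geq 1,
\end{align*}
and the non-stationary policy stores the maximizing action $\pi^*_k(s) = \argmax_a \sum_{s'} T(s,a,s')\, V^{k-1}(s')$ for every state $s$ and number of remaining steps $k$. The following is a minimal sketch of this backup in Java; the identifiers (numStates, transition, bestAction, and so on) are illustrative and do not necessarily match the members of our actual classes.
\begin{verbatim}
// Minimal sketch of the finite-horizon Bellman backup (value iteration).
// Field and method names are illustrative and may differ from our classes.
public class ValueIterationSketch {

    int numStates;            // |S|
    int numActions;           // |A|
    double[] reward;          // R(s): reward collected in state s
    double[][][] transition;  // transition[s][a][sp] = P(sp | s, a)

    double[][] value;         // value[k][s] = V^k(s)
    int[][] bestAction;       // bestAction[k][s]: best action, k steps left

    // Computes V^k and the non-stationary policy for k = 0..h.
    void valueIteration(int h) {
        value = new double[h + 1][numStates];
        bestAction = new int[h + 1][numStates];

        for (int s = 0; s < numStates; s++) {
            value[0][s] = reward[s];   // V^0(s) = R(s): no steps left
            bestAction[0][s] = -1;     // no action taken (NOP)
        }

        for (int k = 1; k <= h; k++) {
            for (int s = 0; s < numStates; s++) {
                double best = Double.NEGATIVE_INFINITY;
                int bestA = -1;
                for (int a = 0; a < numActions; a++) {
                    double q = 0.0;    // E[V^{k-1}] after action a
                    for (int sp = 0; sp < numStates; sp++) {
                        q += transition[s][a][sp] * value[k - 1][sp];
                    }
                    if (q > best) { best = q; bestA = a; }
                }
                value[k][s] = reward[s] + best;
                bestAction[k][s] = bestA;
            }
        }
    }
}
\end{verbatim}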
\section{MDP}
We defined our MDP based on a simple ``get the cheese'' game. The game is played on an $h \times w$ maze. Each cell may contain a piece of cheese, which has a value and never runs out. There is a single mouse in the maze, and we want to find the best policy to guide it. As soon as the mouse enters a cell with cheese it is rewarded with the value of that cheese; since the cheese never runs out, the mouse keeps collecting the reward for as long as it stays in that cell. The following defines the MDP concretely.
\begin{itemize}
  \item States: The MDP has $n = h \times w$ states. Since there is a single mouse and the state of the world is fully determined by the cell the mouse occupies, one state per cell is sufficient.
  \item Actions: We have four different actions for moving in four directions in the maze.
  \item Reward: Any state without cheese has zero reward. A cell with cheese has a reward equal to the value of its cheese, which varies from cell to cell.
  \item Transition: The transition function is defined such that, with probability greater than 50\%, an action moves the mouse to the intended cell; with the remaining probability the mouse may end up in an unintended cell (see the sketch after this list).
\end{itemize}
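To make the transition model concrete, the following is a minimal sketch of how such a noisy transition function can be constructed for the maze. The success probability (0.7 here), the even split of the remaining probability over the other moves, the action encoding, and the rule that moves off the grid leave the mouse in place are illustrative assumptions for this sketch, not necessarily the exact settings used in our experiments.
\begin{verbatim}
// Minimal sketch of a noisy maze transition model (assumed settings).
public class MazeTransitionSketch {

    // Builds transition[s][a][sp]: the intended move succeeds with
    // probability 0.7; the rest is split evenly over the other moves.
    static double[][][] buildTransition(int h, int w) {
        int n = h * w;
        int numActions = 4;          // 0=up,1=down,2=left,3=right (assumed)
        double successProb = 0.7;    // assumed; text only says > 50%
        double slipProb = (1.0 - successProb) / (numActions - 1);

        double[][][] t = new double[n][numActions][n];
        for (int s = 0; s < n; s++) {
            for (int a = 0; a < numActions; a++) {
                for (int b = 0; b < numActions; b++) {
                    int sp = nextCell(s, b, h, w);  // cell reached by move b
                    t[s][a][sp] += (a == b) ? successProb : slipProb;
                }
            }
        }
        return t;
    }

    // Cell reached from s by move a; off-grid moves stay put (assumed).
    static int nextCell(int s, int a, int h, int w) {
        int row = s / w, col = s % w;
        if (a == 0) row = Math.max(row - 1, 0);      // up
        if (a == 1) row = Math.min(row + 1, h - 1);  // down
        if (a == 2) col = Math.max(col - 1, 0);      // left
        if (a == 3) col = Math.min(col + 1, w - 1);  // right
        return row * w + col;
    }
}
\end{verbatim}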
\section{Experiments}
In our evaluation we use a $5 \times 5$ maze with a total of 25 states (note: states are numbered 0 to 24). We evaluated our implementation in two scenarios. The first is the simplest possible case: only a single cell contains cheese. In the second scenario more than one cell contains cheese, with different values, so depending on the horizon the mouse may head toward different cells. In the following we show the results for these two scenarios.
\subsection{First Scenario}
In this scenario only one cell (the cell at (2,4)) contains cheese. We run the Bellman backup (value iteration) with $h=5$ and $h=10$. Based on the outputs we can see that the implementation finds the correct values and policy. Table \ref{table1} shows the value function and policy (V=value, A=action). Consider state 14: if only one time step is left the chosen action is not meaningful, but with 2 steps left the policy moves up to get closer to the cheese. Another example is state 5: with fewer than 3 steps left the value is zero and the ``right'' action is just an arbitrary tie-breaking choice, but with 3 or more steps left the value is non-zero and the action is meaningful.
\begin{table}
  \centering
  \tiny
  \begin{tabular} {|c|c|c|c|c|c|c|}
  \hline
  state&0&1&2&3&4&5\\\hline 
State0	&(V=.0,A=NOP)&(V=.0,A=right)&(V=.0,A=right)&(V=.0,A=right)&(V=.5,A=right)&(V=.6,A=up)\\\hline
State1	&(V=.0,A=NOP)&(V=.0,A=right)&(V=.0,A=right)&(V=.6,A=right)&(V=.6,A=right)&(V=1.3,A=right)\\\hline
State2	&(V=.0,A=NOP)&(V=.0,A=right)&(V=.7,A=right)&(V=.7,A=right)&(V=1.4,A=right)&(V=1.4,A=right)\\\hline
State3	&(V=.0,A=NOP)&(V=.7,A=down)&(V=.7,A=down)&(V=1.4,A=down)&(V=1.4,A=down)&(V=2.1,A=down)\\\hline
State4	&(V=.0,A=NOP)&(V=.0,A=right)&(V=.7,A=left)&(V=.7,A=right)&(V=1.4,A=left)&(V=1.4,A=right)\\\hline
State5	&(V=.0,A=NOP)&(V=.0,A=right)&(V=.0,A=right)&(V=.3,A=right)&(V=.3,A=right)&(V=1.0,A=right)\\\hline
State6	&(V=.0,A=NOP)&(V=.0,A=right)&(V=.5,A=right)&(V=.5,A=right)&(V=1.2,A=right)&(V=1.2,A=right)\\\hline
State7	&(V=.0,A=NOP)&(V=.7,A=right)&(V=.7,A=right)&(V=1.4,A=right)&(V=1.4,A=right)&(V=2.1,A=right)\\\hline
State8	&(V=1.0,A=NOP)&(V=1.0,A=right)&(V=1.7,A=right)&(V=1.7,A=right)&(V=2.4,A=right)&(V=2.4,A=right)\\\hline
State9	&(V=.0,A=NOP)&(V=.7,A=left)&(V=.7,A=left)&(V=1.4,A=left)&(V=1.4,A=left)&(V=2.1,A=left)\\\hline
State10	&(V=.0,A=NOP)&(V=.0,A=right)&(V=.0,A=right)&(V=.0,A=right)&(V=.4,A=right)&(V=.4,A=right)\\\hline
State11	&(V=.0,A=NOP)&(V=.0,A=right)&(V=.0,A=right)&(V=.5,A=right)&(V=.5,A=right)&(V=1.2,A=right)\\\hline
State12	&(V=.0,A=NOP)&(V=.0,A=right)&(V=.6,A=right)&(V=.6,A=right)&(V=1.3,A=right)&(V=1.3,A=right)\\\hline
State13	&(V=.0,A=NOP)&(V=.7,A=up)&(V=.7,A=up)&(V=1.4,A=up)&(V=1.4,A=up)&(V=2.1,A=up)\\\hline
State14	&(V=.0,A=NOP)&(V=.0,A=right)&(V=.7,A=up)&(V=.7,A=up)&(V=1.4,A=up)&(V=1.4,A=up)\\\hline
State15	&(V=.0,A=NOP)&(V=.0,A=right)&(V=.0,A=right)&(V=.0,A=right)&(V=.0,A=right)&(V=.4,A=up)\\\hline
State16	&(V=.0,A=NOP)&(V=.0,A=right)&(V=.0,A=right)&(V=.0,A=right)&(V=.5,A=right)&(V=.5,A=right)\\\hline
State17	&(V=.0,A=NOP)&(V=.0,A=right)&(V=.0,A=right)&(V=.5,A=up)&(V=.5,A=up)&(V=1.1,A=up)\\\hline
State18	&(V=.0,A=NOP)&(V=.0,A=right)&(V=.5,A=up)&(V=.5,A=up)&(V=1.1,A=up)&(V=1.1,A=up)\\\hline
State19	&(V=.0,A=NOP)&(V=.0,A=right)&(V=.0,A=right)&(V=.6,A=up)&(V=.6,A=up)&(V=1.3,A=up)\\\hline
State20	&(V=.0,A=NOP)&(V=.0,A=right)&(V=.0,A=right)&(V=.0,A=right)&(V=.0,A=right)&(V=.0,A=right)\\\hline
State21	&(V=.0,A=NOP)&(V=.0,A=right)&(V=.0,A=right)&(V=.0,A=right)&(V=.0,A=right)&(V=.4,A=right)\\\hline
State22	&(V=.0,A=NOP)&(V=.0,A=right)&(V=.0,A=right)&(V=.0,A=right)&(V=.4,A=right)&(V=.4,A=right)\\\hline
State23	&(V=.0,A=NOP)&(V=.0,A=right)&(V=.0,A=right)&(V=.3,A=up)&(V=.3,A=up)&(V=.9,A=up)\\\hline
State24	&(V=.0,A=NOP)&(V=.0,A=right)&(V=.0,A=right)&(V=.0,A=right)&(V=.5,A=up)&(V=.6,A=right)\\\hline\end{tabular}
\caption{Value function and policy for $h=5$, scenario 1; columns are indexed by the number of remaining steps}
\label{table1}
\end{table}
Tables \ref{table2} and \ref{table3} show the value function and policy for $h=10$, respectively. We observe that the values for fewer than 6 remaining steps are identical to those in Table \ref{table1}; this is expected, since $V^k$ depends only on the number of remaining steps and not on the total horizon.
\begin{table}
  \tiny
  \centering
  \begin{tabular}{|c|c|c|c|c|c|c|c|c|c|c|c|}
    \hline
      state&0&1&2&3&4&5&6&7&8&9&10\\\hline 
    State0	&(.0)&(.0)&(.0)&(.0)&(.5)&(.6)&(1.2)&(1.2)&(1.9)&(1.9)&(2.6)\\\hline
State1	&(.0)&(.0)&(.0)&(.6)&(.6)&(1.3)&(1.3)&(2.0)&(2.0)&(2.7)&(2.7)\\\hline
State2	&(.0)&(.0)&(.7)&(.7)&(1.4)&(1.4)&(2.1)&(2.1)&(2.8)&(2.8)&(3.5)\\\hline
State3	&(.0)&(.7)&(.7)&(1.4)&(1.4)&(2.1)&(2.1)&(2.8)&(2.8)&(3.5)&(3.5)\\\hline
State4	&(.0)&(.0)&(.7)&(.7)&(1.4)&(1.4)&(2.1)&(2.1)&(2.8)&(2.8)&(3.5)\\\hline
State5	&(.0)&(.0)&(.0)&(.3)&(.3)&(1.0)&(1.0)&(1.6)&(1.6)&(2.3)&(2.3)\\\hline
State6	&(.0)&(.0)&(.5)&(.5)&(1.2)&(1.2)&(1.8)&(1.8)&(2.5)&(2.5)&(3.2)\\\hline
State7	&(.0)&(.7)&(.7)&(1.4)&(1.4)&(2.1)&(2.1)&(2.8)&(2.8)&(3.5)&(3.5)\\\hline
State8	&(1.0)&(1.0)&(1.7)&(1.7)&(2.4)&(2.4)&(3.1)&(3.1)&(3.8)&(3.8)&(4.5)\\\hline
State9	&(.0)&(.7)&(.7)&(1.4)&(1.4)&(2.1)&(2.1)&(2.8)&(2.8)&(3.5)&(3.5)\\\hline
State10	&(.0)&(.0)&(.0)&(.0)&(.4)&(.4)&(1.1)&(1.1)&(1.7)&(1.7)&(2.4)\\\hline
State11	&(.0)&(.0)&(.0)&(.5)&(.5)&(1.2)&(1.2)&(1.9)&(1.9)&(2.6)&(2.6)\\\hline
State12	&(.0)&(.0)&(.6)&(.6)&(1.3)&(1.3)&(2.0)&(2.0)&(2.7)&(2.7)&(3.4)\\\hline
State13	&(.0)&(.7)&(.7)&(1.4)&(1.4)&(2.1)&(2.1)&(2.8)&(2.8)&(3.5)&(3.5)\\\hline
State14	&(.0)&(.0)&(.7)&(.7)&(1.4)&(1.4)&(2.1)&(2.1)&(2.8)&(2.8)&(3.5)\\\hline
State15	&(.0)&(.0)&(.0)&(.0)&(.0)&(.4)&(.4)&(1.1)&(1.1)&(1.7)&(1.7)\\\hline
State16	&(.0)&(.0)&(.0)&(.0)&(.5)&(.5)&(1.1)&(1.1)&(1.7)&(1.7)&(2.4)\\\hline
State17	&(.0)&(.0)&(.0)&(.5)&(.5)&(1.1)&(1.1)&(1.8)&(1.8)&(2.5)&(2.5)\\\hline
State18	&(.0)&(.0)&(.5)&(.5)&(1.1)&(1.1)&(1.8)&(1.8)&(2.5)&(2.5)&(3.2)\\\hline
State19	&(.0)&(.0)&(.0)&(.6)&(.6)&(1.3)&(1.3)&(2.0)&(2.0)&(2.7)&(2.7)\\\hline
State20	&(.0)&(.0)&(.0)&(.0)&(.0)&(.0)&(.4)&(.4)&(1.0)&(1.1)&(1.7)\\\hline
State21	&(.0)&(.0)&(.0)&(.0)&(.0)&(.4)&(.4)&(1.0)&(1.0)&(1.7)&(1.7)\\\hline
State22	&(.0)&(.0)&(.0)&(.0)&(.4)&(.4)&(1.0)&(1.0)&(1.6)&(1.6)&(2.3)\\\hline
State23	&(.0)&(.0)&(.0)&(.3)&(.3)&(.9)&(.9)&(1.6)&(1.6)&(2.3)&(2.3)\\\hline
State24	&(.0)&(.0)&(.0)&(.0)&(.5)&(.6)&(1.2)&(1.2)&(1.9)&(1.9)&(2.6)\\\hline
  \end{tabular}
  \caption{Value function for $h=10$, scenario 1; columns are indexed by the number of remaining steps}
  \label{table2}
\end{table}
\begin{table}
\tiny
  \centering
  \begin{tabular}{|c|c|c|c|c|c|c|c|c|c|c|c|}
    \hline
      state&0&1&2&3&4&5&6&7&8&9&10\\\hline 
State0	&(NOP)&(right)&(right)&(right)&(right)&(up)&(right)&(up)&(right)&(up)&(right)\\\hline
State1	&(NOP)&(right)&(right)&(right)&(right)&(right)&(right)&(right)&(right)&(right)&(right)\\\hline
State2	&(NOP)&(right)&(right)&(right)&(right)&(right)&(right)&(right)&(right)&(right)&(right)\\\hline
State3	&(NOP)&(down)&(down)&(down)&(down)&(down)&(down)&(down)&(down)&(down)&(down)\\\hline
State4	&(NOP)&(right)&(left)&(right)&(left)&(right)&(left)&(right)&(left)&(right)&(left)\\\hline
State5	&(NOP)&(right)&(right)&(right)&(right)&(right)&(right)&(right)&(right)&(right)&(right)\\\hline
State6	&(NOP)&(right)&(right)&(right)&(right)&(right)&(right)&(right)&(right)&(right)&(right)\\\hline
State7	&(NOP)&(right)&(right)&(right)&(right)&(right)&(right)&(right)&(right)&(right)&(right)\\\hline
State8	&(NOP)&(right)&(right)&(right)&(right)&(right)&(right)&(right)&(up)&(up)&(right)\\\hline
State9	&(NOP)&(left)&(left)&(left)&(left)&(left)&(left)&(left)&(left)&(left)&(left)\\\hline
State10	&(NOP)&(right)&(right)&(right)&(right)&(right)&(right)&(right)&(right)&(right)&(right)\\\hline
State11	&(NOP)&(right)&(right)&(right)&(right)&(right)&(right)&(right)&(right)&(right)&(right)\\\hline
State12	&(NOP)&(right)&(right)&(right)&(right)&(right)&(right)&(right)&(right)&(right)&(right)\\\hline
State13	&(NOP)&(up)&(up)&(up)&(up)&(up)&(up)&(up)&(up)&(up)&(up)\\\hline
State14	&(NOP)&(right)&(up)&(up)&(up)&(up)&(up)&(up)&(up)&(up)&(up)\\\hline
State15	&(NOP)&(right)&(right)&(right)&(right)&(up)&(up)&(up)&(up)&(up)&(up)\\\hline
State16	&(NOP)&(right)&(right)&(right)&(right)&(right)&(right)&(right)&(right)&(right)&(right)\\\hline
State17	&(NOP)&(right)&(right)&(up)&(up)&(up)&(up)&(up)&(up)&(up)&(up)\\\hline
State18	&(NOP)&(right)&(up)&(up)&(up)&(up)&(up)&(up)&(up)&(up)&(up)\\\hline
State19	&(NOP)&(right)&(right)&(up)&(up)&(up)&(up)&(up)&(up)&(up)&(up)\\\hline
State20	&(NOP)&(right)&(right)&(right)&(right)&(right)&(up)&(left)&(up)&(left)&(up)\\\hline
State21	&(NOP)&(right)&(right)&(right)&(right)&(right)&(right)&(right)&(right)&(right)&(right)\\\hline
State22	&(NOP)&(right)&(right)&(right)&(right)&(right)&(right)&(right)&(right)&(right)&(right)\\\hline
State23	&(NOP)&(right)&(right)&(up)&(up)&(up)&(up)&(up)&(up)&(up)&(up)\\\hline
State24	&(NOP)&(right)&(right)&(right)&(up)&(right)&(up)&(right)&(up)&(right)&(up)\\\hline  \end{tabular}
  \caption{Policy for $h=10$, scenario 1; columns are indexed by the number of remaining steps}
  \label{table3}
\end{table}
\subsection{Second Scenario}
In this scenario we place more than one piece of cheese in the maze, which lets us evaluate whether our algorithm finds the optimal policy as a function of the cheese positions and the number of steps left. We put cheese in state 9 and state 16, with values 3 and 1 respectively. Table \ref{table4} shows the result of running the Bellman backup for $h=5$. Consider state 6: with only 2 steps left the policy selects the down action, because the nearer, lower-valued cheese can then be reached in one more step; with 3 steps left it chooses to go right toward the other cheese, which has the higher value.
\begin{table}
  \centering
  \tiny
  \begin{tabular} {|c|c|c|c|c|c|c|}
  \hline
  state&0&1&2&3&4&5\\\hline 
State0	&(V=.0,A=NOP)&(V=.0,A=right)&(V=.0,A=right)&(V=.0,A=right)&(V=.5,A=down)&(V=1.4,A=right)\\\hline
State1	&(V=.0,A=NOP)&(V=.0,A=right)&(V=.0,A=right)&(V=.3,A=down)&(V=1.7,A=right)&(V=2.9,A=right)\\\hline
State2	&(V=.0,A=NOP)&(V=.0,A=right)&(V=.0,A=right)&(V=1.9,A=right)&(V=3.3,A=right)&(V=5.3,A=right)\\\hline
State3	&(V=.0,A=NOP)&(V=.0,A=right)&(V=2.1,A=right)&(V=3.6,A=right)&(V=5.7,A=right)&(V=7.4,A=right)\\\hline
State4	&(V=.0,A=NOP)&(V=2.1,A=down)&(V=3.6,A=down)&(V=5.7,A=down)&(V=7.4,A=down)&(V=9.4,A=down)\\\hline
State5	&(V=.0,A=NOP)&(V=.0,A=right)&(V=.0,A=right)&(V=.6,A=down)&(V=.8,A=right)&(V=1.7,A=right)\\\hline
State6	&(V=.0,A=NOP)&(V=.0,A=right)&(V=.5,A=down)&(V=1.1,A=right)&(V=2.0,A=right)&(V=3.6,A=right)\\\hline
State7	&(V=.0,A=NOP)&(V=.0,A=right)&(V=1.5,A=right)&(V=2.6,A=right)&(V=4.5,A=right)&(V=6.1,A=right)\\\hline
State8	&(V=.0,A=NOP)&(V=2.1,A=right)&(V=3.6,A=right)&(V=5.6,A=right)&(V=7.4,A=right)&(V=9.4,A=right)\\\hline
State9	&(V=3.0,A=NOP)&(V=5.1,A=right)&(V=7.2,A=right)&(V=9.1,A=right)&(V=11.1,A=right)&(V=13.0,A=right)\\\hline
State10	&(V=.0,A=NOP)&(V=.0,A=right)&(V=.7,A=down)&(V=.7,A=down)&(V=1.4,A=down)&(V=1.5,A=down)\\\hline
State11	&(V=.0,A=NOP)&(V=.7,A=down)&(V=.7,A=down)&(V=1.4,A=down)&(V=1.6,A=right)&(V=2.7,A=right)\\\hline
State12	&(V=.0,A=NOP)&(V=.0,A=right)&(V=.6,A=down)&(V=1.7,A=right)&(V=2.9,A=right)&(V=4.7,A=right)\\\hline
State13	&(V=.0,A=NOP)&(V=.0,A=right)&(V=1.9,A=right)&(V=3.3,A=right)&(V=5.2,A=right)&(V=6.9,A=right)\\\hline
State14	&(V=.0,A=NOP)&(V=2.1,A=up)&(V=3.6,A=up)&(V=5.6,A=up)&(V=7.4,A=up)&(V=9.3,A=up)\\\hline
State15	&(V=.0,A=NOP)&(V=.7,A=right)&(V=.7,A=right)&(V=1.4,A=right)&(V=1.4,A=right)&(V=2.1,A=right)\\\hline
State16	&(V=1.0,A=NOP)&(V=1.0,A=right)&(V=1.7,A=right)&(V=1.7,A=right)&(V=2.4,A=left)&(V=2.6,A=right)\\\hline
State17	&(V=.0,A=NOP)&(V=.7,A=left)&(V=.7,A=left)&(V=1.4,A=left)&(V=1.7,A=up)&(V=2.8,A=up)\\\hline
State18	&(V=.0,A=NOP)&(V=.0,A=right)&(V=.5,A=left)&(V=1.6,A=up)&(V=2.8,A=up)&(V=4.4,A=up)\\\hline
State19	&(V=.0,A=NOP)&(V=.0,A=right)&(V=1.5,A=up)&(V=2.6,A=up)&(V=4.4,A=up)&(V=6.0,A=up)\\\hline
State20	&(V=.0,A=NOP)&(V=.0,A=right)&(V=.7,A=right)&(V=.7,A=right)&(V=1.4,A=right)&(V=1.4,A=right)\\\hline
State21	&(V=.0,A=NOP)&(V=.7,A=up)&(V=.7,A=up)&(V=1.4,A=up)&(V=1.4,A=up)&(V=2.1,A=up)\\\hline
State22	&(V=.0,A=NOP)&(V=.0,A=right)&(V=.7,A=left)&(V=.7,A=left)&(V=1.4,A=left)&(V=1.6,A=up)\\\hline
State23	&(V=.0,A=NOP)&(V=.0,A=right)&(V=.0,A=right)&(V=.6,A=left)&(V=1.4,A=up)&(V=2.5,A=up)\\\hline
State24	&(V=.0,A=NOP)&(V=.0,A=right)&(V=.0,A=right)&(V=1.0,A=up)&(V=2.0,A=up)&(V=3.5,A=up)\\\hline
\end{tabular}
\caption{Value function and policy for $h=5$, scenario 2; columns are indexed by the number of remaining steps}
\label{table4}
\end{table}
Tables \ref{table5} and \ref{table6} show the value function and policy for $h=10$, respectively. As in scenario 1, the values for fewer than 6 remaining steps match those in Table \ref{table4}.
\begin{table}
\tiny
  \centering
  \begin{tabular}{|c|c|c|c|c|c|c|c|c|c|c|c|}
    \hline
      state&0&1&2&3&4&5&6&7&8&9&10\\\hline 
State0	&(.0)&(.0)&(.0)&(.0)&(.5)&(1.4)&(2.5)&(4.3)&(5.8)&(7.8)&(9.6)\\\hline
State1	&(.0)&(.0)&(.0)&(.3)&(1.7)&(2.9)&(4.8)&(6.5)&(8.5)&(10.3)&(12.2)\\\hline
State2	&(.0)&(.0)&(.0)&(1.9)&(3.3)&(5.3)&(7.0)&(9.0)&(10.9)&(12.8)&(14.7)\\\hline
State3	&(.0)&(.0)&(2.1)&(3.6)&(5.7)&(7.4)&(9.4)&(11.3)&(13.2)&(15.1)&(17.1)\\\hline
State4	&(.0)&(2.1)&(3.6)&(5.7)&(7.4)&(9.4)&(11.3)&(13.3)&(15.2)&(17.1)&(19.0)\\\hline
State5	&(.0)&(.0)&(.0)&(.6)&(.8)&(1.7)&(3.0)&(4.3)&(6.2)&(7.9)&(9.8)\\\hline
State6	&(.0)&(.0)&(.5)&(1.1)&(2.0)&(3.6)&(5.1)&(7.1)&(8.8)&(10.8)&(12.6)\\\hline
State7	&(.0)&(.0)&(1.5)&(2.6)&(4.5)&(6.1)&(8.1)&(9.9)&(11.9)&(13.7)&(15.7)\\\hline
State8	&(.0)&(2.1)&(3.6)&(5.6)&(7.4)&(9.4)&(11.3)&(13.2)&(15.1)&(17.0)&(18.9)\\\hline
State9	&(3.0)&(5.1)&(7.2)&(9.1)&(11.1)&(13.0)&(14.9)&(16.8)&(18.7)&(20.6)&(22.6)\\\hline
State10	&(.0)&(.0)&(.7)&(.7)&(1.4)&(1.5)&(2.4)&(3.8)&(5.3)&(7.1)&(8.8)\\\hline
State11	&(.0)&(.7)&(.7)&(1.4)&(1.6)&(2.7)&(4.3)&(5.8)&(7.7)&(9.4)&(11.3)\\\hline
State12	&(.0)&(.0)&(.6)&(1.7)&(2.9)&(4.7)&(6.4)&(8.2)&(10.0)&(11.9)&(13.8)\\\hline
State13	&(.0)&(.0)&(1.9)&(3.3)&(5.2)&(6.9)&(8.8)&(10.7)&(12.6)&(14.5)&(16.4)\\\hline
State14	&(.0)&(2.1)&(3.6)&(5.6)&(7.4)&(9.3)&(11.2)&(13.1)&(15.0)&(16.9)&(18.8)\\\hline
State15	&(.0)&(.7)&(.7)&(1.4)&(1.4)&(2.1)&(2.3)&(3.3)&(4.5)&(6.0)&(7.7)\\\hline
State16	&(1.0)&(1.0)&(1.7)&(1.7)&(2.4)&(2.6)&(3.7)&(5.1)&(6.6)&(8.3)&(10.0)\\\hline
State17	&(.0)&(.7)&(.7)&(1.4)&(1.7)&(2.8)&(4.3)&(5.8)&(7.6)&(9.3)&(11.2)\\\hline
State18	&(.0)&(.0)&(.5)&(1.6)&(2.8)&(4.4)&(6.0)&(7.8)&(9.6)&(11.5)&(13.3)\\\hline
State19	&(.0)&(.0)&(1.5)&(2.6)&(4.4)&(6.0)&(7.8)&(9.6)&(11.5)&(13.4)&(15.3)\\\hline
State20	&(.0)&(.0)&(.7)&(.7)&(1.4)&(1.4)&(2.1)&(2.3)&(3.3)&(4.5)&(5.9)\\\hline
State21	&(.0)&(.7)&(.7)&(1.4)&(1.4)&(2.1)&(2.3)&(3.3)&(4.4)&(5.8)&(7.5)\\\hline
State22	&(.0)&(.0)&(.7)&(.7)&(1.4)&(1.6)&(2.6)&(3.9)&(5.4)&(7.2)&(8.9)\\\hline
State23	&(.0)&(.0)&(.0)&(.6)&(1.4)&(2.5)&(3.8)&(5.3)&(7.0)&(8.7)&(10.5)\\\hline
State24	&(.0)&(.0)&(.0)&(1.0)&(2.0)&(3.5)&(4.9)&(6.6)&(8.3)&(10.2)&(12.0)\\\hline
  \end{tabular}
  \caption{Value function for $h=10$, scenario 2; columns are indexed by the number of remaining steps}
  \label{table5}
\end{table}
\begin{table}
\tiny
  \centering
  \begin{tabular}{|c|c|c|c|c|c|c|c|c|c|c|c|}
    \hline
      state&0&1&2&3&4&5&6&7&8&9&10\\\hline 
State0	&(NOP)&(right)&(right)&(right)&(down)&(right)&(right)&(right)&(right)&(right)&(right)\\\hline
State1	&(NOP)&(right)&(right)&(down)&(right)&(right)&(right)&(right)&(right)&(right)&(right)\\\hline
State2	&(NOP)&(right)&(right)&(right)&(right)&(right)&(right)&(right)&(right)&(right)&(right)\\\hline
State3	&(NOP)&(right)&(right)&(right)&(right)&(right)&(right)&(right)&(right)&(right)&(right)\\\hline
State4	&(NOP)&(down)&(down)&(down)&(down)&(down)&(down)&(down)&(down)&(down)&(down)\\\hline
State5	&(NOP)&(right)&(right)&(down)&(right)&(right)&(right)&(right)&(right)&(right)&(right)\\\hline
State6	&(NOP)&(right)&(down)&(right)&(right)&(right)&(right)&(right)&(right)&(right)&(right)\\\hline
State7	&(NOP)&(right)&(right)&(right)&(right)&(right)&(right)&(right)&(right)&(right)&(right)\\\hline
State8	&(NOP)&(right)&(right)&(right)&(right)&(right)&(right)&(right)&(right)&(right)&(right)\\\hline
State9	&(NOP)&(right)&(right)&(right)&(right)&(right)&(right)&(right)&(right)&(right)&(right)\\\hline
State10	&(NOP)&(right)&(down)&(down)&(down)&(down)&(right)&(right)&(right)&(right)&(right)\\\hline
State11	&(NOP)&(down)&(down)&(down)&(right)&(right)&(right)&(right)&(right)&(right)&(right)\\\hline
State12	&(NOP)&(right)&(down)&(right)&(right)&(right)&(right)&(right)&(right)&(right)&(right)\\\hline
State13	&(NOP)&(right)&(right)&(right)&(right)&(right)&(right)&(right)&(right)&(right)&(right)\\\hline
State14	&(NOP)&(up)&(up)&(up)&(up)&(up)&(up)&(up)&(up)&(up)&(up)\\\hline
State15	&(NOP)&(right)&(right)&(right)&(right)&(right)&(right)&(right)&(right)&(right)&(right)\\\hline
State16	&(NOP)&(right)&(right)&(right)&(left)&(right)&(right)&(right)&(right)&(right)&(right)\\\hline
State17	&(NOP)&(left)&(left)&(left)&(up)&(up)&(up)&(up)&(up)&(up)&(up)\\\hline
State18	&(NOP)&(right)&(left)&(up)&(up)&(up)&(up)&(up)&(up)&(up)&(up)\\\hline
State19	&(NOP)&(right)&(up)&(up)&(up)&(up)&(up)&(up)&(up)&(up)&(up)\\\hline
State20	&(NOP)&(right)&(right)&(right)&(right)&(right)&(right)&(right)&(up)&(up)&(up)\\\hline
State21	&(NOP)&(up)&(up)&(up)&(up)&(up)&(up)&(up)&(up)&(up)&(right)\\\hline
State22	&(NOP)&(right)&(left)&(left)&(left)&(up)&(up)&(right)&(right)&(right)&(right)\\\hline
State23	&(NOP)&(right)&(right)&(left)&(up)&(up)&(right)&(right)&(right)&(right)&(right)\\\hline
State24	&(NOP)&(right)&(right)&(up)&(up)&(up)&(up)&(up)&(up)&(up)&(up)\\\hline
 \end{tabular}
  \caption{Policy for $h=10$, scenario 2; columns are indexed by the number of remaining steps}
  \label{table6}
\end{table}
\end{document}
